Unsupervised Machine Learning With 
Python: Clustering. K-Means Clustering 


Over the next few posts, we will look at a few of the many clustering algorithms that 
are available to us in the Python programming language. We will not go into much detail 
for now, as these are the first posts I am writing on each of these topics. As we gradually build our 
collection of posts, we will dive deeper into each of these algorithms. For now, it is 
important that you understand the overall logic and process behind each of them. 


The K-Means Clustering Algorithm 


K-means is one of the most popular strategies for clustering data. It requires you to decide in 
advance how many clusters there are. It is a form of flat clustering, and it works iteratively. 
The algorithm follows the steps listed below. 


PHASE 1: SELECT THE NUMBER OF CLUSTERS 


The required number of clusters, K, must be specified up front. 


PHASE 2: ASSIGN DATA POINTS TO, AND ADJUST CLUSTERS 
[ITERATIVE PHASE] 


Once the number of clusters is fixed, each data point is randomly assigned to one of the K clusters, 
which gives us an initial grouping of the data. 


The centroid of each cluster is then calculated as the mean of the points assigned to it. 


Since this is an iterative procedure, each data point is then reassigned to its nearest centroid and the 
K centroids are recalculated. This repeats until the assignments stop changing, or, to put it another 
way, until the centroids settle. The result is a local optimum, which is not guaranteed to be the 
global one, so it can depend on the initial assignment. 
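

The two phases above can be sketched in plain NumPy. This is only a minimal illustration under stated assumptions, not how Scikit-learn implements K-means: the function name kmeans_sketch, the max_iters and seed parameters, the tiny X_demo array, and the rule of re-seeding an empty cluster with a random data point are all made up for this example.

```python
import numpy as np

def kmeans_sketch(X, k, max_iters=100, seed=0):
    """Hypothetical helper illustrating the two phases; not scikit-learn's code."""
    rng = np.random.default_rng(seed)
    # Start of phase 2: randomly assign every point to one of the k clusters
    labels = rng.integers(0, k, size=len(X))
    for _ in range(max_iters):
        # Centroid of each cluster = mean of its points.
        # Assumption: an empty cluster is re-seeded with a random data point.
        centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j)
            else X[rng.integers(len(X))]
            for j in range(k)
        ])
        # Reassign every point to its nearest centroid (Euclidean distance)
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = distances.argmin(axis=1)
        # Stop once no point changes cluster
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, centroids

# Made-up demo data: two tight, well-separated groups of three points
X_demo = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                   [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
labels, centroids = kmeans_sketch(X_demo, k=2)
print(labels)
print(centroids)
```

The cluster ids themselves are arbitrary: which group ends up as cluster 0 depends on the random starting assignment.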


The K-means clustering technique can be implemented in Python with the following code. 
We will use the Scikit-learn library, which is one of the most popular machine 
learning libraries today. 


Clustering Example 


We begin by importing the necessary packages into our script instance as follows: 


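# Numerical arrays and plotting 
import numpy as np 
import matplotlib.pyplot as plt 
import seaborn as sns 

# K-Means model and a generator for dummy clustered data 
from sklearn.cluster import KMeans 
from sklearn.datasets import make_blobs 

# Apply seaborn's default plot styling 
sns.set() 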


In the following code, the make_blobs() function from the sklearn.datasets package creates a 
two-dimensional dataset containing four blobs. It lets us generate dummy data points that already 
form clusters. 


X, y_true = make_blobs(n_samples = 500, 
                       centers = 4, 
                       cluster_std = 0.40, 
                       random_state = 0)  # seed value is cut off in the source; 0 is an assumed placeholder 
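

To get a feel for what make_blobs() returns, it can help to inspect the two arrays before plotting anything. The short sketch below does that; the seed random_state=0 is an assumption made only so that the run is repeatable.

```python
import numpy as np
from sklearn.datasets import make_blobs

# random_state=0 is an assumed seed, used only to make the run repeatable
X, y_true = make_blobs(n_samples=500, centers=4, cluster_std=0.40, random_state=0)

print(X.shape)            # (500, 2): 500 points with 2 features each
print(np.unique(y_true))  # [0 1 2 3]: ids of the blobs that generated the points
```

X holds the coordinates we will cluster, while y_true records which blob each point was drawn from. K-Means never sees y_true.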


Next, to get a feel for the data that has been generated for us, we can visualize 
it using the Matplotlib package. 


plt.title("Scatter Plot Showing K-Means Cluster Groups") 
plt.xlabel("X-AXIS") 
plt.ylabel("Y-AXIS") 
# Plot the first feature against the second for every data point 
plt.scatter(X[:, 0], X[:, 1], s = 50) 
plt.show() 


The output of the visualization code above is as follows: 


[Figure: Scatter Plot Showing K-Means Cluster Groups] 


Now that we have a dataset, we can train a K-Means clustering model on our dummy 
data. We instantiate an object of the KMeans class as follows: 


algorithm = KMeans(n_clusters=4) 


Next, we train the model by calling the .fit() method on the KMeans object, passing the input 
data as an argument: 


model = algorithm.fit(X) 


Next, we obtain the predicted cluster for each record (observation/row): 


cluster_predictions = model.predict(X) 
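

As a quick sanity check on these predictions, the sketch below compares them with the labels_ attribute that the fitted model stores, and counts how many points fall in each cluster. The make_blobs seed, and the n_init=10 and random_state=0 arguments to KMeans, are assumptions added here so that the run is repeatable; they are not part of the code above.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Assumed seeds, used only to make the run repeatable
X, _ = make_blobs(n_samples=500, centers=4, cluster_std=0.40, random_state=0)
model = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
cluster_predictions = model.predict(X)

# On the training data, predict() should agree with the labels stored by fit()
print(np.array_equal(cluster_predictions, model.labels_))

# Number of points assigned to each of the 4 clusters
cluster_sizes = np.bincount(cluster_predictions, minlength=4)
print(cluster_sizes)
```

The cluster ids (0 to 3) are arbitrary and need not match the blob ids returned by make_blobs().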


We can also obtain the center (an (x, y) coordinate) of each cluster: 


centers = model.cluster_centers_ 
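

To see how the centers relate to the predictions, the sketch below recomputes the cluster of each point by hand as the index of its nearest center and compares the result with .predict(). The seeds and the n_init=10 argument are assumptions made for repeatability.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Assumed seeds, used only to make the run repeatable
X, _ = make_blobs(n_samples=500, centers=4, cluster_std=0.40, random_state=0)
model = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
centers = model.cluster_centers_
print(centers.shape)  # (4, 2): one (x, y) coordinate per cluster

# Euclidean distance from every point to every center, shape (500, 4)
distances = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
# The nearest center's index is the cluster we would assign by hand
nearest = distances.argmin(axis=1)
print(np.array_equal(nearest, model.predict(X)))
```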


Finally, we can visualize the KMeans model using Matplotlib as follows: 


plt.title("Scatter Plot Showing K-Means Cluster Groups") 
plt.xlabel("X-AXIS") 
plt.ylabel("Y-AXIS") 
# Color each point by its predicted cluster 
plt.scatter(X[:, 0], X[:, 1], c = cluster_predictions, s = 50, cmap = "viridis") 
# Overlay the cluster centers as large, semi-transparent black dots 
plt.scatter(centers[:, 0], centers[:, 1], c = "black", s = 200, alpha = 0.5) 
plt.show() 


The resulting visualization is as follows: 


[Figure: Scatter Plot Showing K-Means Cluster Groups] 
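

Besides looking at the plot, we can put a number on how well the clusters match the blobs, since make_blobs() also returns the true blob id of each point. The sketch below is one possible check, not something the steps above require: it uses adjusted_rand_score from sklearn.metrics, which ignores how the cluster ids happen to be numbered. The seeds and the n_init=10 argument are assumptions made for repeatability.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Assumed seeds, used only to make the run repeatable
X, y_true = make_blobs(n_samples=500, centers=4, cluster_std=0.40, random_state=0)
model = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
cluster_predictions = model.predict(X)

# 1.0 means the clusters match the blobs exactly; values near 0 mean chance-level agreement
ari = adjusted_rand_score(y_true, cluster_predictions)
print(f"Adjusted Rand index: {ari:.3f}")
```

On real data there is usually no y_true to compare against, so a check like this only works for dummy datasets such as this one.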